Bioinformàtica
Research in biology today cannot be understood without computation. Owing mainly to the
development of genomic technologies, biology has been transformed in a very short
period of time from a science in which human effort was mainly oriented towards
gathering a small amount of data into a science that generates a huge volume of data with
little or no human intervention. The effort of researchers has consequently shifted from
data production to data analysis. Computational methods play an essential role in this
shift: in the planning of experiments, in their execution and, especially, in the storage
and analysis of their results. These methods make up a new scientific discipline called
bioinformatics. In this article we review, from a historical perspective, the foundations
of this discipline, which are built around the concept, understood very generically, of
sequence alignment and similarity.
Bioinformática ¿una ciencia sin científicos?
The Human Genome Project has brought biological research an unprecedented presence in the media. This media impact is not gratuitous. Knowledge of the nucleotide sequence of the human genome, and of the amino acid sequences of the proteins encoded in that genome, will, it is said, have an extraordinary impact on medicine, agriculture and many industrial processes. It will consequently have economic, social and perhaps even political repercussions. In short, it will profoundly affect our lives, and it is only natural that it arouses our interest.
Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO!
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Discovery of Cancer Driver Long Noncoding RNAs across 1112 Tumour Genomes: New Candidates and Distinguishing Features
Long noncoding RNAs (lncRNAs) represent a vast unexplored genetic space that may hold missing drivers of tumourigenesis, but few such "driver lncRNAs" are known. Until now, they have been discovered through changes in expression, leading to problems in distinguishing between causative roles and passenger effects. We here present a different approach for driver lncRNA discovery using mutational patterns in tumour DNA. Our pipeline, ExInAtor, identifies genes with excess load of somatic single nucleotide variants (SNVs) across panels of tumour genomes. Heterogeneity in mutational signatures between cancer types and individuals is accounted for using a simple local trinucleotide background model, which yields high precision and low computational demands. We use ExInAtor to predict drivers from the GENCODE annotation across 1112 entire genomes from 23 cancer types. Using a stratified approach, we identify 15 high-confidence candidates: 9 novel and 6 known cancer-related genes, including MALAT1, NEAT1 and SAMMSON. Both known and novel driver lncRNAs are distinguished by elevated gene length, evolutionary conservation and expression. We have presented a first catalogue of mutated lncRNA genes driving cancer, which will grow and improve with the application of ExInAtor to future tumour genome projects
Enhancers with tissue-specific activity are enriched in intronic regions
Tissue function and homeostasis reflect the gene expression signature by which the combination of ubiquitous and tissue-specific genes contributes to tissue maintenance and stimuli-responsive function. Enhancers are central to controlling this tissue-specific gene expression pattern. Here, we explore the correlation between the genomic location of enhancers and their role in tissue-specific gene expression. We find that enhancers showing tissue-specific activity are highly enriched in intronic regions and regulate the expression of genes involved in tissue-specific functions, whereas housekeeping genes are more often controlled by intergenic enhancers, common to many tissues. Notably, an intergenic-to-intronic continuum of active enhancers is observed in the transition from developmental to adult stages: the most differentiated tissues present higher rates of intronic enhancers, whereas the lowest rates are observed in embryonic stem cells. Altogether, our results suggest that the genomic location of active enhancers is key for the tissue-specific control of gene expression.
From identification to validation to gene count
The current GENCODE gene count of ~ 30,000, including 21,727 protein-coding and 8,483 RNA genes, is significantly lower than the 100,000 genes anticipated by early estimates. Accurate annotation of protein-coding and non-coding genes and pseudogenes is essential in calculating the true gene count and gaining insight into human evolution.
As part of the GENCODE Consortium, the HAVANA team produces high quality manual gene annotation, which forms the basis for the reference gene set being used by the ENCODE project and provides a rich annotation of alternative splice variants and assignment of functional potential. However, the protein-coding potential of some splice variants is uncertain and valid splice variants can remain unannotated if they are absent from current cDNA libraries. Recent technological developments in sequencing and mass spectrometry have created a vast amount of new transcript and protein data that facilitate the identification and validation of new and existing transcripts, while harboring their own limitations and problems
Evolution of selenophosphate synthetases: emergence and relocation of function through independent duplications and recurrent subfunctionalization
Selenoproteins are proteins that incorporate selenocysteine (Sec), a nonstandard amino acid encoded by UGA, normally a stop codon. Sec synthesis requires the enzyme Selenophosphate synthetase (SPS or SelD), conserved in all prokaryotic and eukaryotic genomes encoding selenoproteins. Here, we study the evolutionary history of SPS genes, providing a map of selenoprotein function spanning the whole tree of life. SPS is itself a selenoprotein in many species, although functionally equivalent homologs that replace the Sec site with cysteine (Cys) are common. Many metazoans, however, possess SPS genes with substitutions other than Sec or Cys (collectively referred to as SPS1). Using complementation assays in fly mutants, we show that these genes share a common function, which appears to be distinct from the synthesis of selenophosphate carried out by the Sec- and Cys- SPS genes (termed SPS2), and unrelated to Sec synthesis. We show here that SPS1 genes originated through a number of independent gene duplications from an ancestral metazoan selenoprotein SPS2 gene that most likely already carried the SPS1 function. Thus, in SPS genes, parallel duplications and subsequent convergent subfunctionalization have resulted in the segregation to different loci of functions initially carried by a single gene. This evolutionary history constitutes a remarkable example of emergence and evolution of gene function, which we have been able to trace thanks to the singular features of SPS genes, wherein the amino acid at a single site determines unequivocally protein function and is intertwined to the evolutionary fate of the entire selenoproteome
The Origins and the Biological Consequences of the Pur/Pyr DNA·RNA Asymmetry
We analyze the physical origin and the chemical and biological consequences of the asymmetry that occurs in DNA·RNA hybrids when the purine/pyrimidine (Pu/Py) ratio is different in the DNA and RNA strands. When the DNA strand of the hybrid is Py rich, the duplex is much more stable, rigid, and A-like than when the DNA strand is Pu rich. The origins of this dramatic asymmetry are double: first, the apparently innocuous substitution dT → rU produces a significant decrease in stacking, and second, backbone distortions are larger for DNA(Pu)·RNA(Py) hybrids than for the mirror RNA(Pu)·DNA(Py) ones. The functional impact of the structural and dynamic asymmetry in the biological activities of hybrids is dramatic and can be used to improve the efficiency of antisense-type strategies on the basis of the degradation of hybrids by RNase H or gene editing using CRISPR-Cas9 technology
The effects of death and post-mortem cold ischemia on human tissue transcriptomes
Post-mortem tissue samples are a key resource for investigating patterns of gene expression. However, the processes triggered by death and the post-mortem interval (PMI) can significantly alter physiologically normal RNA levels. We investigate the impact of PMI on gene expression using data from multiple tissues of post-mortem donors obtained from the GTEx project. We find that many genes change expression over relatively short PMIs in a tissue-specific manner, but this potentially confounding effect in a biological analysis can be minimized by taking into account appropriate covariates. By comparing ante- and post-mortem blood samples, we identify the cascade of transcriptional events triggered by death of the organism. These events do not appear to simply reflect stochastic variation resulting from mRNA degradation, but active and ongoing regulation of transcription. Finally, we develop a model to predict the time since death from the analysis of the transcriptome of a few readily accessible tissues. Peer reviewed. Postprint (published version).